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Abstract 

Statistical analysis of max-stable processes used to model spatial extremes has been limited 
by the difficulty in calculating the joint likelihood function. This precludes all standard 
likelihood-based approaches, including Bayesian approaches. In this paper we present a 
Bayesian approach through the use of approximate Bayesian computing. This circumvents 
the need for a joint likelihood function by instead relying on simulations from the (un- 
available) likelihood. This method is compared with an alternative approach based on the 
composite likelihood. When estimating the spatial dependence of extremes, we demonstrate 
that approximate Bayesian computing can provide estimates with a lower mean square error 
than the composite likelihood approach, though at an appreciably higher computational cost. 
We also illustrate the performance of the method with an application to US temperature 
data to estimate the risk of crop loss due to an unlikely freeze event. 
Keywords: approximate Bayesian computing, composite likelihood, extremal coefficient, 
likelihood-free, max-stable process, spatial extremes. 

1. Introduction 

Modeling of spatial extremes is motivated by the need to model and predict environmental 
extreme events such as hurricanes, floods, droughts, heat waves, and other high impact 
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events. Though the data have a natural spatial domain, standard spatial statistics methods 
may fail to accurately model extremes. Models specifically designed for extremes are better 
suited. The urgency of focusing on extremes is increased when one considers the potential 
influence of climate change on the probability of such high impact events. We consider point 
referenced data, usually taken as daily or hourly measurements y tt d at locations d — 1, D 
for time points t = 1, T. When modeling extremes, as a first step one takes block maxima 
over some temporal block (usually one year) and obtains block maxima data where i is 
the block. In the environmental setting, the data would typically be annual maxima at each 
of D locations. 

For a single location, univariate extreme value theory pro yides a full range of tools to ana- 



lyze the data. This theory is w e 



2006 



Embrechts et. al. 



1997 



1 developed and docu mented (jColesl . 



Resnickl . 



1987 



2001 



de Haan and Ferreira . 



20071 ). When one considers several locations 



at once, multivariate extreme value theory is a natural extension. Multivariate models of- 
ten work well for lower dimensions, but if the data have a natural spatial domain and the 
dimension grows rapidly, spatial extreme value theory becomes useful. Spatial extremes 
are the infinite dimensional generalization of multivariate extremes. The goal then is to fit 
these block maxima data to a spatial process model so that the spatial dependence may be 
estimated. One promising class of models are max-stable processes. These processes arise 
as the limiting distribution of the maxima of independent and identically distributed ran 
dom fields. A number o f max-stable process models have been described ( jSchlather . 



2002 



Kabluchko et. al. 



20091 ) and one unpublished model was described by Smith in 1990. The 



statistical analysis of these models is limited by the unavailability of the joint likelihood func- 
tion. However, the bivariate distributions are available in closed-form. This allows one to 
write down the pairwise log-likelihood, which is the sum (taken over all unique pairs of loca- 
tions) of all bivariate log-likelihoods, and is thus also a composite log-likelihood. Numerical 
maximization of the composite li kelihood yields estimates of the param eters which are consis- 



tent and asymptotically normal (IPadoan et. al.L 12010 ; 



Lindsay! . 



19881 ). Maximum composite 
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likelihood estimation has been the only method so far for analyzing max-stable processes 
which is widely applicable, implemented computationally (R package Spatial Extremes), 
and for which a viable asymptotic theory exists. 

In this paper we develop a Bayesian alternative for analyzing the dependence of spa- 
tial extremes. It circumvents the need for the joint likelihood, and instead relies only on 
simulations. This approach, termed approximat e Bayesian computin g, has been successfully 



applied in many areas, including extreme values ( jBortot et. al. 



20071 ) . We show three imple- 



mentations of the approximate Bayesian computing approach for analyzing spatial extremes. 
The first two rely on the bivariate distribution function, and like the composite likelihood 
approach they consider the spatial dependence through all unique pairs of locations. The 
third and most successful approach extends beyond pairs, and is able to consider higher 
order k-tuples for k > 3. This feature is an important benefit of the approximate Bayesian 
computing approach over all pairwise approaches. 

We show that the approximate Bayesian computing method can result in a lower mean 
square error compared to the competing composite likelihood approach when estimating 
the spatial dependence. We also discuss how this Bayesian approach naturally incorporates 
parameter uncertainty into predictions, which is a central task in the field of extremes. The 
method is computationally intensive, but the implementation based on independent sampling 
can be carried out in parallel, and we demonstrate the use of a more computationally efficient 
adaptive implementation as well. 



2. Extremes 

2.1. Univariate and Multivariate Extremes 

Let Yi, Y n be independent and identically distributed univariate random variables with 
some distribution function F and let M n = max(Y 1 , Y n ) be the maximum. If M n converges 
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to a non-degenerate distribution under renormalization as 

= F n (a n y + b n ) — > G(y) as n — > oo 



pr 



M n -b n 



< 



for some sequences a n > and b n , then G must be a member of the Generalized Extreme 
Value family, with distribution function 



G{y) = expt- 1 + e 



a 



Here a + = maxfa^ O ), and /i, a > 0, and £ are the location, scale, and shape parameters, 
respectively (IColesl . 1200 fl ). The sign of the shape parameter £ corresponds to the three 
classical extreme value distributions: £ > is Frechet, £ < is Weibull, and £ — » (in 
the limit) is Gumbel. The Frechet case corresponds to a heavy tailed distribution, Gumbel 
is intermediate, and Weibull has a bounded upper limit. The Generalized Extreme Value 
distribution also holds for modeling minima, since one may always write min(Yi, Y n ) = 
-maxt-Yx, -Y n ). 

One property of the Generalized Extreme Value distribution is that it is max-stable: If 
Yi,...,Y n are independent and identically distributed from G, then max(Vi, Y n ) also has 
the same distribution with only a change in location and scale, as G n (y) = G(A n y + B n ) for 
constants A n and B n . A distri bution is a member of th e Generalized Extreme Value family 
if and only if it is max-stable (jLeadbetter et. all Il983l ). A special case of the Generalized 
Extreme Value family is the unit-Frechet, with distribution function G(y) = exp(— l/y). Any 
member of the Generalized Extreme Value family may be transformed to have unit-Frechet 
margins as follows: if Y has a Generalized Extreme Value distribution, then a new variable 
Z may be defined as 



Y 



a 



and Z has unit-Frechet margins. If the parameters are unknown, they may first be estimated 
and then the transformation to Z is taken. When we model multivariate or spatial extremes, 
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there is no loss in generality when one assumes the margins are all unit-Frechet. In practice, 
one would first estimate all marginal distributions and transform to unit-Frechet, then in a 
second step analyze the spatial dependence. 

We may extend this approach to handle multivariate extremes. Let (Yn, la?), i = 
l,...,n be independent and identically distributed replicates of a D— dimensional random 
vector and let M n = (M n i, M n o) be the vector of componentwise maxima, where M n d = 
max(Y" lrf , Y nd ) for d — 1, D. A non-degenerate limit for M n exists if there exist sequences 
a n d > and b n d, d — 1, D such that 

/ M nl - b nl M nD - b nD \ 
hm pr < yi, < y D = G(y u y D ). 

n-s-oo y Oni d nD J 

Then G is a multivariate extreme value distribution, and is max-stable in the sense that for 
any n > 1 there exist sequences A n d > 0, B n( i, d — 1, D such that 

G n (y 1 , y D ) = G(Auyi + B nl ,..., A nD y D + B nD ). 

The marginal distributions of a multivariate extreme value distribution are necessarily uni- 
variate Generalized Extreme Value distributions. 

2.2. Max-stable Processes 

A common method of modeling spatial extremes is through max-stable processes, which 
arise as an infinite dimensional generalization of multivariate extreme value theory. Let 
Z(x),x G X C W be a stochastic process. If for all n > 1, there exist sequences a n (x), b n (x) 
for some x±, ...,Xrj G X such that 

pr n ( Z{X i~ {x b f Xd) ^ < x *l d=l,...,D^ C.,. ,,,(:(,-,). z(x D )), 

then G xlt „^ XD is a multivariate extreme value distribution. If the above holds for all possible 
Xi,...,xd G X for any D > 1, then the process is a max-stable process. Without loss of 
generality one may assume that a max-stable process has unit-Frechet margins. In practice, 
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this is achieved by estimating the three Generalized Extreme Value parameters at each 
location x^, and then transforming the data. 

Max-stable processes are often shown in spectral representation, and are often constructed 
as follows: Let Y(x) be a non- negative stationary process on M. p such that E(Y(x)) = 1 at 
each x. Let II = {sj} ig N be a Poisson process on M + with intensity ds/s 2 . If Y^x) are 
independent replicates of Y(x), then 

Z(x) = maxSjFj(a;), xGl 



is a stationary max-stable process with unit Frechet margins (jde Haanl . Il984l ). From this, 
the joint distribution may be represented as 

pr(Z(x) < z(x),x G X) = exp < —E I sup . . 

I \ xeX z(x) 



Schlatherl (120021 ) introduced a flexible set of models for max-stable processes, termed 
extremal Gaussian processes. He considered a stationary Gaussian process Y(x) on IR P with 
correlation function p(-) and finite mean [i = E max(0,Y(x)) G (0, oo). Let be a Poisson 
process on (0, oo) with intensity measure fi~ 1 s~ 2 ds. Then 

Z(x) = max Si max(0, Yi(x)) 

i 

is a stationary max-stable process with unit-Frechet margins. The bivariate distribution 
function is 

1/2- 



w{Z\ < Zx, Z 2 < z 2 ) = exp 
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1 - 2(p(h) + 1) 
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M + z 2 . 



1 



where p(h) is the correlation of the underlying Gaussian process Y(x) and h = \\xx — x 2 \\. 
The correlation is chosen from one of the valid families of correlations for Gaussian processes. 
A few common choices are Whittle-Matern, 



r» \c 2 j \c 2 



< ci < l,c 2 > 0,u > 0, 
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Figure 1: Extremal Gaussian process with Whittle-Matern correlation and parameter <p = {ci,C2,v) = 
(1,3,1) 



Cauchy, 

p{h) = ci { 1+ ( - ) }> , 0<ci<l,Ca>0,i/>0, 
and powered exponential 

p(h) = ciexp j- f — ) [> < ci < l,c 2 > 0,0 < v < 2, 




where C\,C2 and z/ are the nugget, range, and smooth parameters, T is the gamma function 
and K v is the modified Bessel function of the third kind with order v. It is common to fix 
the nugget as c\ = 1, which forces p(h) — > c\ = 1 as h — >■ 0. This is a reasonable assumption 
for many environmental processes. Throughout the remainder of this paper, we will use the 
Bayesian notation cf) to refer to the parameters (02, v) of the correlation function, and write 
p(h) = p(h;<j)) to serve as a reminder that the target of our method is the function p(-). 
Figure [1] shows one realization of a process with the Whittle-Matern correlation function. 
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There are other classes of max-stable processes which the method developed in this paper 
could be applied to. One is the Smith class (Smith 1990, unpublished manuscript), formed 
as follows: Let (si,Xi),i > 1 denote the points of a Poisson process on (0, oo) x MP with 
intensity measure s~ 2 dsdx. Take f(x) to be the multivariate Gaussian density function, 

1 



f(x) = (2vr)- p / 2 |S|- 1 / 2 



cxp 



-—x L x 
2 



Then Z{x) = maXjSj/(x — Xj) is a max-stable process with unit-Frechet margins. Other pro- 
cesses are taken as Zi(x) = exp [e;(x) — l/2a 2 } where ej(x) is a stationary Gaussian process 
with mean and variance a 2 , or Zi(x) = exp [ei(x) — l/2a 2 (x)} where eAx) is a Gaussia n 



process with stationary increments, mean 0, and vax( e(x)) = a 2 (x) fKabluch ko et. al. 



This process is also called a Brown- Resnick process ( IBrown and Resnickl . 119771 ). We elected 



2009|). 



to concentrate on the Schlath er family because it has been widely studied in recent years 
( jBlanchet and Davisonl . 1201 ll ) and its simulations appear realistic for practical datasets. 

Two key features of max-stable processes are repeat ed. First, i t is p ossible to simulate 
max-stable processes through a point process approach (jSchlatherl . 120021 ). Second, only the 
marginal and bivariate distribution functions can be written in closed-form, so the joint 
likelihood function is unavailable. These two statements taken together motivate the use of 
approximate Bayesian computing for modeling dependence in spatial extremes. 

2.3. The Extremal Coefficient 

Let Z(x) be a stationary, isotropic max-stable random field with unit-Frechet margins. 
For D fixed locations, the joint distribution function can be written as 

pr(Z(xi) < z x , Z(x D ) < z D ) = exp {-V{z x , z D )} 



where V(z x , zjj) is the exponent measure first described by lPickandd (1l98lh . Max-stability 
implies that for all N, 



W {Z X < Zl , Z D < z D ) N = exp{-NV(z 1} .., z D )} = exp{-V( Zl /N } z D /N)}. 



This is the homogeneity property of the exponent measure. For D < 2 locations this function 
V(-) can be written out explicitly, but for D > 3 it cannot. If we further consider the joint 
distribution of D locations evaluated at the same value z, we get 

pr(Z(xi) < z, Z(x D ) < z) = exp j- ^ Xu "^ ,Xn ^ 

where 9(xi, ...,xd) = V(l, 1) is the extremal coefficient for the D locations. The coefficient 
takes values between 1 and D, with a value of 1 corresponding to complete dependence 
among the locations, and a value of D corresponding to complete independence. The value 
can be thought of as the number of effectively independent locations among the D under 
consideration. 

Many results are available for the pairwise extremal coefficient, which arises when con- 
sidering any pair of locations, 

6(xi,x 2 ) 



Pr(Z(xi) < z, Z{xz) < z) = exp 

Since the bivariate distribution functions are available in closed form equation ([1]) for the 
Schlather process, one may write out the pairwise extremal coefficients explicitly as 

m = i + p-y*>r ( 2) 



where h — \\xi — X2\\- One may estimate the pairwise extremal coefficients directly from the 
data, and then through th ose estimates obtai n an estimate of p(-). Smith (Smith 1990, un- 



published manuscript) and I Coles et. al.l (Il999f ) proposed an estimate of the pairwise extremal 
coefficients as follows. First, assume that the field Z(-) has been transformed to unit-Frechet. 
This means that 1/Z(-) is unit exponential, and l/max(Z(xi), Z(x 2 )) is exponential with 
mean l/9(xi, £2). A simple estimator then is 

TL 

9{Xi,X2) = — -, -, -, r - ( rr (3) 

2^ i=1 l/m&x(z i {x 1 ),z i (x 2 )) 
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where % is the index for the block. In this paper, we move beyond the pairwise extremal 
coefficient and also focus on the tripletwise extremal coefficient, which is defined for any 
triplet of locations in the relation 



pr(Z(xj) < z, Z(xk) < z, Z(xi) < z) — exp 



(4) 



Since the trivariate distribution function for Z(-) is unavailable, so too is any closed-form 
expression for 8(xj,Xk,xi). However, following the same argument as in the pairwise case, 
we may estimate the coefficients using the estimator 



9(xj,x k ,xi) 



n 



(5) 



Y™ =1 l/max(zi(xj), Zi(x k ), z^xi)) 

where i is the index for the block. These estimated triplets will serve a key function in the 
approximate Bayesian computing algorithm. This argument may be extended to estimate 
all fc-point extremal coefficients for any collection of k locations. 



3. Approximate Bayesian Computing 

Approximate Bayesian computing (ABC), also called likelihood- free computing, aims to 
approximate a posterior distribution when the true likelihood is either unavailable or com- 
putationally prohibitive. Only simulations from the (unavailable ) likelihood are needed. Ap- 



proximate Bayesian computing originated in populati on genetics (IFu and L 



1997 



Tavare et. al. 



1997 



Pritchard et. al. 



1999 



Beaumont et. 



al 



20021 ) , and has since branched out into many 



areas, including extremes ( IBortot et. all 120071 ). Our ta rget is the post e rior d istribution 
7r(0 I z) oc f(z I (P)tt(4>), where z is the observed data. ISisson and Fan! (120101 ) described 



how the ABC method facilitate the computation by introducing an auxiliary parameter z' (a 
simulated dataset) on the same space as observed data z. Thus we are actually computing 



KABc{<f>,z' I Z) OC Tl{z \ z',(fr)ll(z' | 0)vr(0). 
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Integrating out the simulated dataset yields the target posterior of interest 

kabc{4> I z ) / n(z | z',8)n(z' \ 4>)dz'. 



When ir(z \ z', 0) is exactly a point mass at the point z' — z and zero everywhere else, the 
posterior is recovered exactly. This is likely to occur with probability (for continuous data), 
or probability close to zero (for discrete but high dimensional data), so in practice the form 
is usually taken to be 



e \ e 

where K is a kernel density function, and S is a summary statistic. The intractable likelihood 
is weighted in regions where S(z') f=s S(z). When S is a sufficient statistic and in the limit 
as e — > 0, we have lim^o ~Kabc (0 \ z ) — z )- The most familiar form of this is 

tt{z I z',9) oc 1 if d(S(z'),S(z)) < e 

which occurs when K is a uniform density kernel on an interval. With these concessions, 
the most basic implementation is as follows: 

1. Draw a candidate parameter 0' ~ 7r(0) from a prior 

2. Simulate data from f(Z' | <$') 

3. Accept (p' if d(s, s') < e and return to step 1. 

This is the form shown by lMarjoram et. al.l (120031 ). using the uniform kernel (in this paper 
we focus on the uniform kernel exclusively, and drop the kernel notation K(-) in favor of the 
more straightforward acceptance notation d(S(z'), S(z)) < e). Choosing the quantities 5*, d, 
and e is necessarily a trade-off between accuracy of the approximation and computational 
efficiency. In many applications, the challenge is that the chosen summary S must be highly 
informative of the parameter, while at the same time pr(d(S(z), S(z')) < e) must be large 
enough for the entire algorithm to be computable. The algorithm produces an independent 
and identically distributed sample drawn from 7r(0 | d(S(z), S(z')) < e), and the following 
two limits show the role of the threshold e: 
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1. If e ->■ oo, then vr(0 | d{S{z), S{z')) < e) -> tt(0) 

2. If e-> 0, then vr(0 | < e) ->■ vr(0 | 5(2)) 

For large values of e, nearly all draws from the prior will lead to acceptances, thus the 
approximated posterior mirrors the prior. As the threshold is reduced, the approximation 
more closely resembles 7r(0 | S(z)). When S is a sufficient statistic, this is the exact pos- 
terior. However, 5* is unlikely to be sufficient in practice, and is often chosen to be highly 
informative of the parameter (p. The remainder of this section will discuss three particular 
implementations of approximate Bayesian computing for spatial extremes. In each case, the 
particular motivation and definition for the summary will be discussed. 

3.1. The Madogram Method 

Let Z(x) be a stationary, isotropic max-stable random field with Generalized Extreme 
Value margins with £ < 1. The madogram is defined as: 

m{h) = ^E\Z(x + h) - Z{x)\, 

and its natural estimator is defined as 

1 - 

m(h) = — } \zi{x) -Zi(x + h)\, (6) 
2n / — ' 

i=l 

where Zj(x ) is the realiza t ion o f the i th observed process at position x. This estimator is 



unbiased. 



Cooley et. al. 



( 120061 ) showed the relationship between the madogram and the 
extremal coefficient 8(h). If the Generalized Extreme Value shape parameter £ < 1, then 
the madogram m(h) and extremal coefficient 6(h) verify 

+ if£<land£^0 

o(h) = { exp frm\ if e = 0j 



where up = (l + and T(-) is the Gamma function. Note in particular that for unit- 

Gumbel margins (with £ = and a = 1), we have the simple relationship m(h) = log 6(h). 
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We will exploit this simple relationship by first transforming all margins of a max-stable 
process to unit-Gumbel (and not the usual unit-Frechet). This is easily done by taking the 
log of data with unit-Frechet margins. 

Thus assuming that the marginal parameters of the process are known, the estimator 
of the madogram is unbiased, and we have a closed-form expression for the madogram as a 
function of the underlying correlation p(h](f>), which is the target of our method. We can 
naturally define a residual as e(h) = rh(h) —log 9(h). Thus, for the Schlather model, plugging 
in equations ([2} and we obtain residuals 

I " f 

e(h) = — ^2 \ z i( x ) ~ z i( x + h ) \ ~ lo S { 1 
n i=i I 

The parameter value which minimizes the sum of squared residuals is the ordinary least 
squares estimator, equal to 

<Pols = argmin ^ e ( h ) 2 - ( 7 ) 

h 

The summary statistic S is chosen to be the ordinary least squares fit to the madogram, 
subject to the constraint that it be a valid madogram. Mathematically, this is 

S = \og{9(h;4> Ls)}- (8) 

The procedure for utilizing the summary statistic is as follows. For observed data Z, the 
madogram is estimated and the OLS fit is obtained using equation (j7j). Then the summary 
statistic S = s is computed using equation (|Sj). For each successive iteration of the approx- 
imate Bayesian computing algorithm, a simulated data set Z' is obtained from parameter 
cf)' ~ 7r(0). The madogram is estimated and an OLS fit to the madogram 5" = s' is obtained. 
What remains is some means of computing the distance between s and s' . We have chosen 
this as 

d(s,s') = J \s(h)-s'(h)\dh. 

The integral of the absolute differences between the two curves s and s' is computed numeri- 
cally, and taken as a measure of the distance between s and s'. The final step in approximate 



l-p(h-cj>) 



1/2 



13 



Madogram 




Distance 

Figure 2: Example of a madogram (solid line), estimate (points), and ordinary least squares fit (dotted line). 
The entire dotted line is the summary statistic, defined by parameter 4>ols- 

Bayesian computing is to accept 0' for the posterior if d(s, s') < e for some suitably chosen 
e. 

The output of this is a collection of M particles (/>[,..., 4>' M which are taken to be a sample 
from the approximated posterior. From this, we computed the correlation function p(h; (f>) 
evaluated for each <f' m . The analog of a posterior mean in this setting is the pointwise mean 
of all accepted functions, 

1 M 

W = ^E^^' ( 9 ) 

m=l 

We use the pointwise mean when evaluating the performance in a simulation study. 

3.2. The Pairwise Extremal Coefficient Method 

This approach is very similar to the preceding madogram approach, but instead of fitting a 
smooth curve to the madogram we fit the curve directly to the pairwise extremal coefficients. 
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We define the residual as e(h) = 9(h)— 9(h). Plugging in equations §2$) and (j3J), the parameter 
value which minimizes the sum of residuals is equal to 

<Pols = argmin^ ^ e ( h ) 2 - 
h 

The summary statistic 5* is chosen to be the ordinary least squares fit to the madogram, 
subject to the constraint that it be a valid madogram. Mathematically, this is 

S = 9(h;4> OLS ). (10) 

The remainder proceeds exactly as in the madogram method, using the summary shown in 
equation (jTUl) . 

3.3. The Tripletwise Extremal Coefficient Method 

Both the madogram approach and the pairwise extremal coefficient approach rely on pair s 



of locations. This is also true for the composite likelihood approach ( IPadoan et. al.l . 120101 ). 
A natural improvement is an approximate Bayesian computing method which moves beyond 
pairs and considers higher order k-tupl es. Th e use o f triplets was explored by 



Genton et. al. 



(120 111 ), but only for the Smith model (jSmithl . Il990l ). a small subset of max-stable processes 



that does not include the Schlather model. In this section we use the estimated triplet 
extremal coefficients from equation (jSj) as the basis for the summary statistic S(-), and thus 
utilize information from triplets in the estimation of Schlather max-stable processes. 

The number of unique sets of triplets in a set of data with D locations is (^*) = 
D ( D - 1 )( L> - 2 ) ; w hich grows quite rapidly as D increases. For example, with only D = 20 
locations we have 1140 unique triplets. This combinatorial explosion as D increases poses a 
problem for an approximate Bayesian computing approach. Higher dimensional summaries 
can only decrease the probability of acceptances, which may quickly leave an approach un- 
computable in any practical sense. On the other hand, the uncertainty in estimating a 
single triplet extremal coefficient using equation ([5]) can be quite large (as compared with 
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the known bounds [1,-D]), so there is a natural desire to group estimates into homogeneous 
groups and take averages to reduce the uncertainty in estimation. The idea then is to group 
the (^) triplets into K groups, which are ideally homogeneous within groups, heterogeneous 
across groups, and all such that K <C (^) . 

To reduce the dimension o f the summary, we group these (^) triplets into K groups using 



Ward's method (IWard 



19631 ). This method only requires a measure of distance between 
items, and the number of groupings. A triplet of locations is a triangle between 3 points, 
which produces 3 Euclidean distances A = (ai, a,2, 0.3). To measure the distance between two 
triplets A and B, we take 

dist(A, B) = min \ \ai — b^u\\ (11) 

i 

where 7r(i) is a permutation of {1, 2, 3}. Two sets of triplets A and B with identical lengths 
but different rotations, translation, and reflection would give a distance measure of zero. 
Such two sets should also have the same theoretical tripletwise extremal coefficients since 
the underlying field is isotropic and stationary. On the other hand, as two triangles become 
more dissimilar in their respective lengths, the distance measure will increase. Thus the 
clustering is based entirely on the geometry of the locations, and not on the actual estimates 
of the tripletwise extremal coefficients. 

For a data set with D locations, the first step is to compute an upper triangular dis- 
similarity matrix of size (^) by (^) which contains all distances computed using equation 
( TTTj) . In our simulations based on D = 20 locations, we chose to group the triplets into 
K = 100 clusters. This was selected to achieve a balance between maintaining within-group 
homogeneity and ensuring that enough triplets fall into each group to reduce variability in 
group averages. Ward's method is a hierarchical algorithm which first assigns each item to 
its own cluster, and then merges two clusters chosen to minimize the overall increase in the 
sum of squares (which is the sum of squared distances from each item to its cluster center). 
Thus, the sum of squares begins at zero, and Ward's method proceeds by merging items 
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which would result in the smallest increase. In our setting this clustering only needs to be 
done once, since all of the simulated draws will be at the same locations as the observed 
data. The requirement to enumerate all (^) triplets is a practical limitation to how large 
D may be. For values of D where the dissimilarity matrix is computable, it may be time 
consuming to run the clustering algorithm. 

The (^) triplet extremal coefficients are estimated for the observed data using equation 
([5]), and then values are averaged within the K clusters. The result is the summary of 
the observed data, s = (9i,...,9k). Next, we begin the approximate Bayesian computing 
procedure. Independent draws from the prior <p' ~ tt(0) are taken. The parameter space is 
$ = (0,oo) x (0,oo), except when a powered exponential is used in which case it is $ = 
(0, 00) x (0, 2]. For each draw from the prior, a max-stable process with unit-Frechet margins 
is simulated on the same locations and for the same number of years as the observed data. 
We estimate all triplet extremal coefficients for this simulated data, compute s 1 = (9[, 9' K ), 
and use the sum of the absolute deviations as the distance metric d: 

K 

d(s,s') = J2\sk-s' k \. (12) 
fc=i 

This entire process is repeated / times. The result is a collection of candidate parameter 
values = 1, which are then filtered as : di < e). This final filtration is an 

independent and identically distributed collection of M particles drawn from Tc(<f) | d(s, s') < 
e), which for very small e may be taken as an approximation to the true posterior. For 
each particle one can compute the spatial correlation function p(h; 4>' m ). Standard errors are 
obtained by simply regarding p(4>' m ) as an independent and identically distributed collection 
of draws from the posterior, and empirical 95% credible intervals can be constructed. 
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4. Simulation Study 

4-1. ABC Rejection Simulations 

We study the performance of the approximate Bayesian computing algorithm for es- 
timating the spatial dependence of a Schlather process with Whittle-Matern correlation 
p(c\ = l,c 2 ,z/). Simulations were conducted in R. We specified uniform, independent priors 
on [0,10] for the range C2 and smooth v parameters. This nicely spans the range of possible 
dependence functions on the space X (see Figure [3]), and is consistent with the preference 
for minimally informative priors. While this prior may not be the most efficient choice, it 
does suffice to show the advantages of approximate Bayesian computing over the composite 
likelihood approach. We make the comparison using mean square error as our measure of 
performance. The simulations were all carried out for n = 100 years of data at D = 20 
locations whose locations were drawn from a uniform distribution on a 10 by 10 grid. 

For each dataset we estimated the spatial dependence using both the composite likelihood 
approach and the approximate Bayesian computing approach shown in equation (Q. Figure 
|3] shows an example. Approximate Bayesian computing was done with 1=1, 000, 000 draws. 
Due to the substantial computing time needed, we ran the simulations in parallel on 50 
nodes on a research computing cluster, with each node only responsible for simulating 20,000 
datasets. In parallel, total computing time for one dataset in one model was around 8 hours 
for the madogram method (which contains a numeric optimization step for each iteration), 
but often faster for the ABC pairwise and ABC tripletwise approaches. Given this constraint, 
we chose to limit the number of repetitions to only 5 replications for each model. In all there 
were 6 models, therefore 30 simulation runs (in the next subsection we discuss a faster 
implementation with more simulations). 

The output from each simulation was filtered as : di < ep), where ep is the 0.02% 
percentile of dj. This ensures exactly 200 particles are accepted for the approximate posterior 
distribution for each simulation. We found that the spacings of the (D = 20) locations can 
shift the overall distribution of di, so for identical model specifications one may need different 



18 



thresholds of e to ensure enough particles are accepted. Thus, it is better not to specify a 
fixed threshold e but instead set as a very low percentile. 

We judged relative performance of the methods based on estimating the true correlation 
p(h; 4>true), not in estimating the true parameter 4>true- Two very different parameters 
<pi and <p2 can produce similar correlations p(h; <fti) ~ p(h; 02 ) (by moving the range and 
smooth parameters in opposite directions, for example). A diffuse posterior distribution of 
can actually produce a tight posterior distribution of p(h;<f)); we have observed that the 
ABC method shows this behavior. Thus, the comparison is made between the true correla- 
tion function p(h; 4>true) and estimated correlation function under the various approaches: 
p{h) = p(h; 4>mcle) for the composite likelihood method, and for the ABC approaches the 
pointwise posterior mean in equation (jUJ). 

Mean square error was computed as a numeric approximation to 



Taking the interval over the range {h > : p(h; 4>true) > 0.1} focuses the comparison on 
the regions of higher spatial correlation, which are of greater interest. If we computed MSE 
over the entire range of p(-), results of this paper would not differ in any meaningful way. 
We stress that the pointwise mean in equation (Q is used only to compare the ABC methods 
with the composite likelihood approach in the simulation study. When the ABC approach 
is used alone in practice, one would use the full approximate posterior to handle prediction, 
credible intervals, and assess uncertainty. The application at the end of this manuscript 
shows an example. 

Results are shown in Table 1. The first two columns show the performance of the two 
pairwise ABC methods, the madogram and extremal pairwise coefficient methods, followed 
by the ABC tripletwise and MCLE approaches. The average MSE over 5 runs is lowest for 
the ABC tripletwise in 5 of 6 models, and essentially tied with composite likelihood in the 
sixth. Within each model specification, having only five repetitions means standard error 




(13) 
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Figure 3: Left: Span of models included in simulation study. Models range from short-range dependence 
(A) to long-range dependence (F) on the scale of simulated data. In the three models labeled A, B, and C 
(solid lines) the adaptive ABC tripletwise approach outperformed MCLE. Right: Example of approximate 
Bayesian computing (dashed line) and maximum composite likelihood (dotted line) estimates from one model 
run, as compared to the true model (solid line). 
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Table 1: Mean Square Error using equation (fT3|) for both the composite likelihood and three approximate 
Bayesian computing methods. Reported values are averages from 5 simulations of each of the six models. 
Standard error estimates are shown in brackets. The values reported have order of magnitude is 10 -4 . 
The final column shows the relative reduction in MSE when using the approximate Bayesian computing 
tripletwise method, as compared to the composite likelihood method. 
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ABC 


ABC 


Composite 


Reduction: ABC Tripletwise 


Model 




Madogram 


Pairwise 


Tripletwise 


Likelihood 


vs. Composite Likelihood 


A (c 2 = 0.5,i/ = 


= 1) 


99 [68] 


45 [27] 


15 [9] 


26 [13] 


40.1% 


B (C2 = l,v = 


1) 


207 [81] 


262 [102] 


125 [77] 


146 [121] 


14.5% 


C (02 = 1, V = 


3) 


314 [104] 


272 [37] 


103 [46] 


140 [43] 


26.6% 


D (02 = 3, v = 


1) 


98 [38] 


284 [155] 


99 [69] 


163 [74] 


39.4% 


E (c2 = 3, v = 


3) 


385 [208] 


555 [241] 


287 [100] 


283 [105] 


-1.3% 


F (c 2 = 5, v = 


3) 


269 [92] 


645 [174] 


70 [41] 


485 [374] 


85.6% 
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estimates remain large, and it is difficult to make firm conclusions. However, when viewed 
as a whole, these 30 runs do show evidence in favor of the ABC tripletwise approach. The 
ABC tripletwise method outperformed the ABC pairwise method in 25 out of 30 of the 
model runs, and it outperformed the composite likelihood approach in 19 out of 30 runs. 
The ABC tripletwise method also gave the best estimate out of the four methods in 16 out 
of 30 runs, whereas the composite likelihood was best in only 8 of 30 runs. We found the 
greatest correlation in performance between the ABC madogram and ABC pairwise methods 
(0.613), as expected, since these two methods are the most similar and utilize essentially the 
same information in the summary statistics. These findings motivated the larger simulation 
study discussed in the next section. 

4-2. Adaptive ABC Simulations 

Motivated by the promising results for the ABC tripletwise approach, we carried out 
a second simulation study using a more computationally efficient adaptive approximate 
Bayesian computing (AABC) algorithm to more closely compare the performance of the 
ABC tripletwise and MCLE approaches. The aim of this simulation study was to increase 
computational effici ency and the number of simulations per model, but avoid the use of 



parallel computing. iBeaumont et. al.l (120091 ) describe the adaptive algorithm in detail. It 



is a sequential algorithm which uses ABC rejection sampling to produce a first approxima- 
tion, and then re-samples from this first approximation for second round of ABC rejection 
sampling to produce a subsequent approximation. This allows the second stage of ABC to 
sample more efficiently, thus increasing the efficiency of the a l gorith m. We implemented the 



AABC algorithm exactly as described by IBeaumont et. al.l (120091 ) . Specifically, the steps 
were to: 

1. Run the ABC rejection algorithm exactly as described in the section 4.1 but with 
I = 100,000 simulations to produce a first approximation 4>^\ (f)^ (the J = 500 
particles filtered as : dj < ep), where ep is the 0.5% percentile of dA 
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2. Compute f2 as twice the empirical variance of (f>\ , (pj . 

(a) Resample a particle <p* from , 

(b) Mutate using kernel K{<ft \ (j)*) = N(<p\ ft) 

(c) Simulate Z' | 0', compute summary s' and distance d(s, s') as before 

(d) (Repeat 100,000 times) 

3. Filter the 100,000 particles as : di < ep), where ep is the 0.5% percentile, ensuring 

(2) 

exactly 500 particles are accepted. Call these (j) m ,m = 1, ...,500. 

(2) 

4. For accepted particle (j) m compute weight 



VL evaluated at the point (pin. 

The only additional consequence of this adaptive algorithm is that we have produced a 
weighted sample from the approximate posterior, and thus have to modify our estimate of 
the correlation function from equation (Q to now be 



Results for MCLE and the AABC approximation are shown in Table 2. The more efficient 
AABC approach could be run without the use of any parallel computing, freeing up nodes, 
which allowed for 30 runs in each model (thus 6 ■ 30 = 180 simulations in total). We chose 
an initial sampling of 100,000 and a re-sampling of 100,000 to keep the computational cost 
to around 8 hours per run. This means the performance of AABC as shown in Table 2 is 
roughly what a user might expect when analyzing a dataset on a single computer in a single 
day. The AABC method resulted in a lower MSE for the three short range processes (A, B, 
and C) but a larger MSE for the three longer range processes (D, E, and F). Clearly, the 
adaptive ABC approach was not shown to outperform MCLE for all of the models, but there 
is a clear statistical benefit for the short-range processes. We discuss this more in section 6. 



1 





(14) 



m=l 
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Table 2: Mean Square Error of the pointwise mean correlation function estimates (taken with respect to 
the true correlation function) for the MCLE and AABC methods. Reported values are averages from 30 
simulations of each of the six models. Standard error estimates are shown in brackets. The values reported 
have order of magnitude is 10~ 4 . 
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5. Application to US Temperature Data 

We illustrate the methodology of the ABC tripletwise extremal coefficient approach on 
US temperature data in northern Texas, with the aim of modeling the acreage of cotton at 
risk of an October freeze. Data on crop losses taken from the United States Department 
of Agriculture Risk Management Agency shows that between 1989 and 2008, Texas cotton 
losses caused by freezing totaled $108,478,787. Of these losses, fully 67.8% ($73,642,461) 
occurred in the month of October. 

The data are daily minimum temperature data taken from 30 gauged sites centered 
around northern Texas in the United States, freely obtained from the Gl obal Historical Cli- 



mato logy Network (http:/ /www. ncdc.noaa.gov/oa/climate/ghcn-daily/) (IPeterson and Vosd . 
19971 ). All sites are between 104 and 98 degrees west longitude and 31 to 37 degrees north 
latitude. We required stations have at least 90% of daily October values for at least 90% of 
the years between 1935 and 2009. This region is shown in Figure HI Also shown are the 58 
counties which jointly comprise the four Texas agricultural districts responsible for 82.3% of 
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Figure 4: Locations of the 30 gauged sites and 58 counties in the primary cotton growing region of Texas. 

all Texas upland cotton acreage (in 2009). For each year and location, we took the minimum 
daily temperature in the month of October. The aim of the analysis was to estimate the 
spatial dependence of the process through p(-), and use this information to estimate the 
number of acres of cotton at risk of an early freeze through simulations. 

First we transformed data at each location to unit-Frechet by fitting the marginal univari- 
ate data to the Generalized Extreme Value distribution and obtained maximum likelihood 
estimates of the location-specific Generalized Extreme Value parameters fi(x), cr(x), £(x) for 
Xx,...,X3o £ X. These estimates are shown in Figure [51 The location parameter and scale 
parameters were both influenced heavily by spatial location, whereas the shape parameter 
showed no discernible relationship to spatial placement. 

Next we estimated the tripletwise extremal coefficients for the 4060 unique triplets us- 
ing equation We clustered these into K = 100 groups using Ward's method as de- 
scribed in Section 3. Our summary statistic for the data was the average within K groups, 
s = 6*100) for K = 100 groups. We considered the Whittle-Matern, Cauchy, and 

powered exponential correlation functions as possible models. Using the standard compos- 
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Figure 5: Estimates of the Generalized Extreme Value parameters at each of the 30 gauged locations. The 
location fj,(x) and scale cr(x) show spatial dependence, whereas the shape parameter £(x) does not. Heavy 
lines are US state borders. 
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are US state borders. 
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ite likelihood information criteria for model selection ( jPadoan et. al.l . 120101 ) . we found the 
Whittle-Matern to have the best fit, and selected the uniform priors for the range C2 and 
smooth v as independent uniforms U[0,10] (see section 6 for more on model selection and 
ABC). For this model, this prior allows for a full range of spatial processes on the scale of 
the observed data in X. We drew 1,000,000 draws from the prior, and ran the approximate 
Bayesian computing algorithm. 

The threshold e was set as the 0.02% percentile of dj. This ensured exactly 200 particles 
were accepted for the approximate posterior. The pointwise mean of these 200 accepted 
functions p(h; </>') was taken as the estimate of the correlation function. Approximate point- 
wise 95% credible intervals were estimated as the pointwise 2.5% and 97.5% quantiles, taken 
at each h. This is shown in Figure (j7]). 

The primary aim of this application was to estimate the number of acres of cotton at 
risk of an October freeze. To extrapolate the max-stable process to the 58 ungauged county 
centroids, we obtained the county centroids from the US Census Bureau 

(http:/ /www.census.gov/geo/www/cenpop/county/coucntr48.html), and extrapolated location 
specific Generalized Extreme Value parameters for each of these by using the standard spa- 
tial Kriging in the R package fields. We found estimates of the location /j,(x) scale ct(x) 
parameters varied with location, whereas the shape parameter £(x) showed no discernible 
relationship to spatial location, shown in Figure [5] Thus, for an ungauged location x' we 
used Kriged values of /i(x') and cr(x'), but took = ^ Y^=i £( x i)> ^ ne average of the 30 
estimates £,(x) from the gauged locations (thus each £,(x') was the same for all x'). Kriged 
values for the location and scale are shown in Figure |6j 

We generated simulations from the fitted model not with the intent of matching them to 
observed data, but rather to calculate a distribution of cotton losses in hypothetical future 
years under the same climate. We sampled <fr (with replacement) from the approximate pos- 
terior and simulated a max-stable process with unit-Frechet margins and Whittle-Matern 
correlation at the 58 ungauged county centroids. We used the Kriged location-specific Gen- 
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eralized Extreme Value parameters to transform this back to a temperature scale at each of 
the 58 county centroids. If the minimum October temperature of the county centroid fell 
below the chosen threshold, we assumed all acres within the county were exposed to the tem- 
perature event. Figure (IE]) shows the log-density of exposed counties for 10,000 simulations 
at various temperature thresholds. This method of simulation based on the approximate 
posterior very naturally incorporates parameter uncertainty into the quantity of interest. 

Of particular interest to the insurance and agricultural communities are the number 
of simulations resulting in intermediate exposure, say between 0.5 and 3.5 million acres 
(contrasted with the all-or-nothing scenarios of or the full 4.1 million acres exposed). We 
find 49.8, 38.5, and 26 percent fall into this range for thresholds of 32, 30, and 28 degrees 
Fahrenheit. This provides evidence that these counties are not completely dependent with 
respect to an October freeze, and lends additional support to the idea that it may be possible 
to offer financial or insurance products to protect against crop losses caused by this freeze 
peril. The method shown here allows for realistic estimation of a distribution of insurance 
losses, going beyond the empirical distribution calculated from past data. This information 
could be useful to an actuary interested in calculating the expected payout associated with 
an insurance policy. 



6. Discussion 



One area which we have not discussed is model selection. When fitting max-stable pro- 
cesses, there are choices of both the structure of the max-stable model (Schlather vs. al- 
ternatives) and of the spatial correlation fun ction within the model. Ideally, we would like 
a method of deciding amongst those models. IVarin and Vidonil (120051 ) introduced the com- 
posite likelihood information criteria (CLIC), which works well with a composite likelihood 
framework, but this does not have a logical Bayesian interpretation. The typical approach 



used in approximate Bayesian computing has been to obtain app roximate Bayes 



which is the more logical choice within the Bayesian context. Both 
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Figure 7: Left: approximate Bayesian computing estimate of spatial correlation of the max-stable process, 
with pointwise 95% credible interval estimates shown as dotted lines. Right: Approximate posterior of the 
range and smooth parameters. 



and 



Blum! (120101 ) followed this approach. However, recent literatur e has cast a shadow 



over the quality of ap pr oximating Bay e s fact ors using ABC methods (IRobert et. al. 



2010 



Sisson and Fan 



20101 ). 



Robert et. al. 



(120101 ) in particular found that even with sufficient 
statistics for two models under consideration, the Bayes factor obtained from ABC cannot 
always be trusted. In the more realistic setting with in-sufficient statistics and thus wider 
loss of information, there is even less reason to trust an ABC Bayes factor. We are left 
without a clear ABC model selection procedure at this time, and in the application above, 
we used the standard CLIC as a first step to determine which model might be best, and then 
proceeded with ABC. 

Approximate Bayesian computing is shown to outperform MCLE in a statistically signifi- 
cant way only for the three short-range max-stable processes. Still, the results are significant 
and of value. We have demonstrated there exists a class of max-stable processes for which 
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Figure 8: A graph showing the number of acres of cotton exposed to minimum temperature event, taken from 
10,000 simulations. The temperature events are minimum October temperature falling below 32 degrees F 
(dotted line), 30 degrees F (dashed line), and 28 degrees F (solid line). 
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an ABC algorithm outperforms the competing MCLE approach. Furthermore, the simu- 
lations in this paper are only exploratory, and in no way exhaust the full range of ABC 
implementations possible. There are open questions as to how quartets, quintets, or higher 
order fc-tuples may be incorporated into an improved summary statistic, and also there are 
open questions as to how more efficient ABC samplers could allow the threshold e to be 
reduced and thus improve the ABC posterior approximation. We are continuing to work 
on these questions, but feel it is a positive development to show an implementation of ABC 
which outperforms MCLE for short-range processes. This implementation should serve as a 
foundation on which improved ABC implementations can be built. 

The computational cost of the ABC tripletwise method is appreciably higher than the 
competing composite likelihood method. However, for those comfortable with parallel or 
adaptive computing, the cost is measured in hours, not days or weeks. Despite this drawback, 
the ABC approach offers several advantages. The simulation study has provided sufficient 
evidence that ABC tripletwise method can result in a lower mean square error when compared 
to the composite likelihood method. The method can in principle be extended beyond triplets 
to /c-tuples for any k > 3, and as computational cost falls such implementations should 
become easier and faster. And finally, the method very naturally incorporates parameter 
uncertainty into prediction, which is a central purpose of the field of extremes. 
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